Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu
left a comment
thanks for the PR, I left one question
```python
if attention_mask.ndim == 4:
    # NPU does not support automatic broadcasting for this type; the mask must be expanded.
    if attention_mask.device.type == 'npu' and attention_mask.shape[1:3] == (1, 1):
```
Can we verify that if we explicitly set the backend to npu, this would also work?
When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error similar to:
`get unsupported atten_mask shape, the shape is [B, 1, 1, S]` — only shapes like `[B, N, S, S]`, `[B, 1, S, S]`, `[1, 1, S, S]`, or `[S, S]` are accepted.
The _native_npu_attention function operates correctly as it leverages _maybe_modify_attn_mask_npu to reshape the attention mask from [batch_size, seq_len_k] to [batch_size, 1, seq_len_q, seq_len_k]. This reshaped format is compatible with the NPU backend.
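The reshaping described above can be sketched as follows. This is an illustrative sketch only, not the actual diffusers helper; the function name and exact logic are assumptions:

```python
import torch

def maybe_modify_attn_mask_npu_sketch(attention_mask: torch.Tensor, seq_len_q: int) -> torch.Tensor:
    # Illustrative sketch of the reshaping described above: a 2D mask of
    # shape [batch, seq_len_k] is expanded to [batch, 1, seq_len_q, seq_len_k],
    # one of the layouts the NPU fused-attention kernel accepts.
    if attention_mask.ndim == 2:
        batch, seq_len_k = attention_mask.shape
        attention_mask = attention_mask.view(batch, 1, 1, seq_len_k)
        attention_mask = attention_mask.expand(batch, 1, seq_len_q, seq_len_k)
    return attention_mask

mask2d = torch.ones(2, 16)  # [batch, seq_len_k]
print(maybe_modify_attn_mask_npu_sketch(mask2d, seq_len_q=16).shape)
# torch.Size([2, 1, 16, 16])
```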
Reference:
Ascend NPU fusion attention API:
https://www.hiascend.com/document/detail/zh/Pytorch/730/apiref/torchnpuCustomsapi/docs/context/torch_npu-npu_fusion_attention.md
> When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error
Just want to make sure we're on the same page: could you share a code example that would produce this error on npu? Specifically, I'd like to know whether you are running the default attention backend, i.e. without wrapping your model call inside `with attention_backend("_native_npu")`.
Yes, this code fixes the issue with the "native" backend. After the fix, it runs correctly with the "_native_attention" backend. Here's an example:
```python
import torch
import torch_npu
from diffusers import ErnieImagePipeline
from diffusers.utils import load_image

pipe = ErnieImagePipeline.from_pretrained("/model_dir/ERNIE-Image", torch_dtype=torch.bfloat16)
pipe = pipe.to("npu")
generator = torch.Generator(device="npu")

prompt = "A black and white Chinese rural dog"
images = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=generator,
    use_pe=True,
).images
images[0].save("ernie-image-output.png")
```
However, I've found that when using `_native_npu` as the backend, there are still some issues with mask handling. I've pushed an additional commit to the previous PR. PR link: #13451.
To enable the `_native_npu` backend, add the following line to the example: `pipe.transformer.set_attention_backend("_native_npu")`
Note: when a 4D mask is passed in, we need to validate and expand it, and apply mask inversion to meet the NPU interface requirements.
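The two steps in the note above (expansion validation and mask inversion) could look roughly like the following sketch. The function name and the assumption that the incoming mask uses 1/True = "attend" are illustrative, not the actual PR code:

```python
import torch

def prepare_npu_mask_sketch(attention_mask: torch.Tensor, seq_len_q: int) -> torch.Tensor:
    # Illustrative sketch, not the actual PR code.
    # 1) Expansion validation: a 4D [B, 1, 1, S] mask is materialized to
    #    [B, 1, seq_len_q, S], a layout the NPU fused-attention kernel accepts.
    if attention_mask.ndim == 4 and attention_mask.shape[1:3] == (1, 1):
        attention_mask = attention_mask.expand(-1, 1, seq_len_q, -1)
    # 2) Inversion: assuming the input uses 1/True = "attend", flip it so
    #    True marks the positions to mask out.
    return ~attention_mask.to(torch.bool)

keep = torch.tensor([[1, 1, 1, 0]]).view(1, 1, 1, 4)  # last token is padding
npu_mask = prepare_npu_mask_sketch(keep, seq_len_q=4)
print(npu_mask.shape)  # torch.Size([1, 1, 4, 4]); the padding column is now True (masked out)
```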
Thank you both for your inputs! I looked into this a bit more.

I wonder if we can build the mask directly in 2D instead of expanding it to 4D? https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_ernie_image.py#L398-L400 That way, I think it would work out of the box with all of our attention backends that support masks. On npu devices, it would work with the "_native_npu" backend but not the default naive backend, but that's the case with all other models currently.

@chang-zhijie, can you help confirm whether a 2D mask would work with the "_native_npu" backend? If so, that would be our preferred direction, but if the baidu team prefers an implementation that works out of the box with the default backend on npu too, we're happy to support that as well. Let us know @HsiaWinter
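For reference, building the padding mask directly in 2D might look like the following minimal sketch (assuming a True = "attend" convention; this is not the actual transformer_ernie_image code):

```python
import torch

def build_2d_padding_mask(lengths: list[int], max_len: int) -> torch.Tensor:
    # Illustrative sketch: build the mask as [batch, seq_len] with
    # True = attend, and let each attention backend broadcast or
    # reshape it to the layout it needs.
    positions = torch.arange(max_len)                       # [seq_len]
    return positions.unsqueeze(0) < torch.tensor(lengths).unsqueeze(1)

mask = build_2d_padding_mask([3, 5], max_len=5)
print(mask.shape)  # torch.Size([2, 5]); True where the position is a real token
```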
What does this PR do?
Fix attention_mask broadcasting for NPU compatibility